Picture for Haoli Bai

Haoli Bai

FreqKV: Frequency Domain Key-Value Compression for Efficient Context Window Extension

Add code
May 01, 2025
Viaarxiv icon

Quantization Hurts Reasoning? An Empirical Study on Quantized Reasoning Models

Add code
Apr 07, 2025
Viaarxiv icon

WeightedKV: Attention Scores Weighted Key-Value Cache Merging for Large Language Models

Add code
Mar 03, 2025
Viaarxiv icon

TreeKV: Smooth Key-Value Cache Compression with Tree Structures

Add code
Jan 09, 2025
Viaarxiv icon

FlatQuant: Flatness Matters for LLM Quantization

Add code
Oct 12, 2024
Figure 1 for FlatQuant: Flatness Matters for LLM Quantization
Figure 2 for FlatQuant: Flatness Matters for LLM Quantization
Figure 3 for FlatQuant: Flatness Matters for LLM Quantization
Figure 4 for FlatQuant: Flatness Matters for LLM Quantization
Viaarxiv icon

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Add code
Sep 26, 2024
Figure 1 for EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Figure 2 for EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Figure 3 for EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Figure 4 for EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Viaarxiv icon

S2D: Sorted Speculative Decoding For More Efficient Deployment of Nested Large Language Models

Add code
Jul 02, 2024
Figure 1 for S2D: Sorted Speculative Decoding For More Efficient Deployment of Nested Large Language Models
Figure 2 for S2D: Sorted Speculative Decoding For More Efficient Deployment of Nested Large Language Models
Figure 3 for S2D: Sorted Speculative Decoding For More Efficient Deployment of Nested Large Language Models
Figure 4 for S2D: Sorted Speculative Decoding For More Efficient Deployment of Nested Large Language Models
Viaarxiv icon

Visually Guided Generative Text-Layout Pre-training for Document Intelligence

Add code
Mar 27, 2024
Figure 1 for Visually Guided Generative Text-Layout Pre-training for Document Intelligence
Figure 2 for Visually Guided Generative Text-Layout Pre-training for Document Intelligence
Figure 3 for Visually Guided Generative Text-Layout Pre-training for Document Intelligence
Figure 4 for Visually Guided Generative Text-Layout Pre-training for Document Intelligence
Viaarxiv icon

MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric

Add code
Mar 12, 2024
Viaarxiv icon

IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact

Add code
Mar 02, 2024
Figure 1 for IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact
Figure 2 for IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact
Figure 3 for IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact
Figure 4 for IntactKV: Improving Large Language Model Quantization by Keeping Pivot Tokens Intact
Viaarxiv icon